{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fb69907a",
   "metadata": {},
   "source": [
    "# Returning Structured Output\n",
    "\n",
    "This notebook covers how to have an agent return a structured output.\n",
    "By default, most of the agents return a single string.\n",
    "It can often be useful to have an agent return something with more structure.\n",
    "\n",
    "\n",
    "A good example of this is an agent tasked with doing question-answering over some sources.\n",
    "Let's say we want the agent to respond not only with the answer, but also a list of the sources used.\n",
    "We then want our output to roughly follow the schema below:\n",
    "\n",
    "```python\n",
    "class Response(BaseModel):\n",
    "    \"\"\"Final response to the question being asked\"\"\"\n",
    "    answer: str = Field(description = \"The final answer to respond to the user\")\n",
    "    sources: List[int] = Field(description=\"List of page chunks that contain answer to the question. Only include a page chunk if it contains relevant information\")\n",
    "```\n",
    "\n",
    "In this notebook we will go over an agent that has a retriever tool and responds in the correct format."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fc33ba5",
   "metadata": {},
   "source": [
    "## Create the Retriever\n",
    "\n",
    "In this section we will do some setup work to create our retriever over some mock data containing the \"State of the Union\" address. Importantly, we will add a \"page_chunk\" tag to the metadata of each document. This is just some fake data intended to simulate a source field. In practice, this would more likely be the URL or path of a document."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e0b62a8e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# pip install chromadb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "4ea20467",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import TextLoader\n",
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "from langchain.vectorstores import Chroma"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "e3002ed7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load in document to retrieve over\n",
    "loader = TextLoader(\"../../state_of_the_union.txt\")\n",
    "documents = loader.load()\n",
    "\n",
    "# Split document into chunks\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
    "texts = text_splitter.split_documents(documents)\n",
    "\n",
    "# Here is where we add in the fake source information\n",
    "for i, doc in enumerate(texts):\n",
    "    doc.metadata[\"page_chunk\"] = i\n",
    "\n",
    "# Create our retriever\n",
    "embeddings = OpenAIEmbeddings()\n",
    "vectorstore = Chroma.from_documents(texts, embeddings, collection_name=\"state-of-union\")\n",
    "retriever = vectorstore.as_retriever()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ec1c106",
   "metadata": {},
   "source": [
    "## Create the tools\n",
    "\n",
    "We will now create the tools we want to give to the agent. In this case, it is just one - a tool that wraps our retriever."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "204ef7ca",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.agents.agent_toolkits.conversational_retrieval.tool import (\n",
    "    create_retriever_tool,\n",
    ")\n",
    "\n",
    "retriever_tool = create_retriever_tool(\n",
    "    retriever,\n",
    "    \"state-of-union-retriever\",\n",
    "    \"Query a retriever to get information about state of the union address\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9af5b61b",
   "metadata": {},
   "source": [
    "## Create response schema\n",
    "\n",
    "Here is where we will define the response schema. In this case, we want the final answer to have two fields: one for the `answer`, and then another that is a list of `sources`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "2df91723",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import List\n",
    "\n",
    "from langchain.utils.openai_functions import convert_pydantic_to_openai_function\n",
    "from pydantic import BaseModel, Field\n",
    "\n",
    "\n",
    "class Response(BaseModel):\n",
    "    \"\"\"Final response to the question being asked\"\"\"\n",
    "\n",
    "    answer: str = Field(description=\"The final answer to respond to the user\")\n",
    "    sources: List[int] = Field(\n",
    "        description=\"List of page chunks that contain answer to the question. Only include a page chunk if it contains relevant information\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2cd181df",
   "metadata": {},
   "source": [
    "## Create the custom parsing logic\n",
    "\n",
    "We now create some custom parsing logic.\n",
    "How this works is that we will pass the `Response` schema to the OpenAI LLM via their `functions` parameter.\n",
    "This is similar to how we pass tools for the agent to use.\n",
    "\n",
    "When the `Response` function is called by OpenAI, we want to use that as a signal to return to the user.\n",
    "When any other function is called by OpenAI, we treat that as a tool invocation.\n",
    "\n",
    "Therefore, our parsing logic has the following blocks:\n",
    "\n",
    "- If no function is called, assume that we should use the response to respond to the user, and therefore return `AgentFinish`\n",
    "- If the `Response` function is called, respond to the user with the inputs to that function (our structured output), and therefore return `AgentFinish`\n",
    "- If any other function is called, treat that as a tool invocation, and therefore return `AgentActionMessageLog`\n",
    "\n",
    "Note that we are using `AgentActionMessageLog` rather than `AgentAction` because it lets us attach a log of messages that we can use in the future to pass back into the agent prompt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "dfb73fe3",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "from langchain_core.agents import AgentActionMessageLog, AgentFinish"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "5b46cdb2",
   "metadata": {},
   "outputs": [],
   "source": [
    "def parse(output):\n",
    "    # If no function was invoked, return to user\n",
    "    if \"function_call\" not in output.additional_kwargs:\n",
    "        return AgentFinish(return_values={\"output\": output.content}, log=output.content)\n",
    "\n",
    "    # Parse out the function call\n",
    "    function_call = output.additional_kwargs[\"function_call\"]\n",
    "    name = function_call[\"name\"]\n",
    "    inputs = json.loads(function_call[\"arguments\"])\n",
    "\n",
    "    # If the Response function was invoked, return to the user with the function inputs\n",
    "    if name == \"Response\":\n",
    "        return AgentFinish(return_values=inputs, log=str(function_call))\n",
    "    # Otherwise, return an agent action\n",
    "    else:\n",
    "        return AgentActionMessageLog(\n",
    "            tool=name, tool_input=inputs, log=\"\", message_log=[output]\n",
    "        )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d7401a1",
   "metadata": {},
   "source": [
    "## Create the Agent\n",
    "\n",
    "We can now put this all together! The components of this agent are:\n",
    "\n",
    "- prompt: a simple prompt with placeholders for the user's question and then the `agent_scratchpad` (any intermediate steps)\n",
    "- tools: we can attach the tools and `Response` format to the LLM as functions\n",
    "- format scratchpad: in order to format the `agent_scratchpad` from intermediate steps, we will use the standard `format_to_openai_function_messages`. This takes intermediate steps and formats them as AIMessages and FunctionMessages.\n",
    "- output parser: we will use our custom parser above to parse the response of the LLM\n",
    "- AgentExecutor: we will use the standard AgentExecutor to run the loop of agent-tool-agent-tool..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "73c785f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.agents import AgentExecutor\n",
    "from langchain.agents.format_scratchpad import format_to_openai_function_messages\n",
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
    "from langchain.tools.render import format_tool_to_openai_function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "e1feaeda",
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt = ChatPromptTemplate.from_messages(\n",
    "    [\n",
    "        (\"system\", \"You are a helpful assistant\"),\n",
    "        (\"user\", \"{input}\"),\n",
    "        MessagesPlaceholder(variable_name=\"agent_scratchpad\"),\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "d27dc3a8",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = ChatOpenAI(temperature=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "7bab4af2",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm_with_tools = llm.bind(\n",
    "    functions=[\n",
    "        # The retriever tool\n",
    "        format_tool_to_openai_function(retriever_tool),\n",
    "        # Response schema\n",
    "        convert_pydantic_to_openai_function(Response),\n",
    "    ]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "b886416c",
   "metadata": {},
   "outputs": [],
   "source": [
    "agent = (\n",
    "    {\n",
    "        \"input\": lambda x: x[\"input\"],\n",
    "        # Format agent scratchpad from intermediate steps\n",
    "        \"agent_scratchpad\": lambda x: format_to_openai_function_messages(\n",
    "            x[\"intermediate_steps\"]\n",
    "        ),\n",
    "    }\n",
    "    | prompt\n",
    "    | llm_with_tools\n",
    "    | parse\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "2cfd783e",
   "metadata": {},
   "outputs": [],
   "source": [
    "agent_executor = AgentExecutor(tools=[retriever_tool], agent=agent, verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9f114fec",
   "metadata": {},
   "source": [
    "## Run the agent\n",
    "\n",
    "We can now run the agent! Notice how it responds with a dictionary with two keys: `answer` and `sources`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "2667c9a4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
      "\u001b[32;1m\u001b[1;3m\u001b[0m\u001b[36;1m\u001b[1;3m[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'page_chunk': 31, 'source': '../../state_of_the_union.txt'}), Document(page_content='One was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more. \\n\\nWhen they came home, many of the world’s fittest and best trained warriors were never the same. \\n\\nHeadaches. Numbness. Dizziness. \\n\\nA cancer that would put them in a flag-draped coffin. \\n\\nI know. \\n\\nOne of those soldiers was my son Major Beau Biden. \\n\\nWe don’t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. \\n\\nBut I’m committed to finding out everything we can. \\n\\nCommitted to military families like Danielle Robinson from Ohio. \\n\\nThe widow of Sergeant First Class Heath Robinson.  \\n\\nHe was born a soldier. Army National Guard. Combat medic in Kosovo and Iraq. \\n\\nStationed near Baghdad, just yards from burn pits the size of football fields. \\n\\nHeath’s widow Danielle is here with us tonight. They loved going to Ohio State football games. He loved building Legos with their daughter.', metadata={'page_chunk': 37, 'source': '../../state_of_the_union.txt'}), Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.  \\n\\nWe’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.  \\n\\nWe’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWe’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'page_chunk': 32, 'source': '../../state_of_the_union.txt'}), Document(page_content='But cancer from prolonged exposure to burn pits ravaged Heath’s lungs and body. \\n\\nDanielle says Heath was a fighter to the very end. \\n\\nHe didn’t know how to stop fighting, and neither did she. \\n\\nThrough her pain she found purpose to demand we do better. \\n\\nTonight, Danielle—we are. \\n\\nThe VA is pioneering new ways of linking toxic exposures to diseases, already helping more veterans get benefits. \\n\\nAnd tonight, I’m announcing we’re expanding eligibility to veterans suffering from nine respiratory cancers. \\n\\nI’m also calling on Congress: pass a law to make sure veterans devastated by toxic exposures in Iraq and Afghanistan finally get the benefits and comprehensive health care they deserve. \\n\\nAnd fourth, let’s end cancer as we know it. \\n\\nThis is personal to me and Jill, to Kamala, and to so many of you. \\n\\nCancer is the #2 cause of death in America–second only to heart disease.', metadata={'page_chunk': 38, 'source': '../../state_of_the_union.txt'})]\u001b[0m\u001b[32;1m\u001b[1;3m{'name': 'Response', 'arguments': '{\\n  \"answer\": \"President mentioned Ketanji Brown Jackson as a nominee for the United States Supreme Court and praised her as one of the nation\\'s top legal minds.\",\\n  \"sources\": [31]\\n}'}\u001b[0m\n",
      "\n",
      "\u001b[1m> Finished chain.\u001b[0m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'answer': \"President mentioned Ketanji Brown Jackson as a nominee for the United States Supreme Court and praised her as one of the nation's top legal minds.\",\n",
       " 'sources': [31]}"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "agent_executor.invoke(\n",
    "    {\"input\": \"what did the president say about kentaji brown jackson\"},\n",
    "    return_only_outputs=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b355665e",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
